Conversation

derekmeegan
Contributor

why

LiteLLM's synchronous completion() method was blocking the event loop in async handlers, preventing concurrent execution of multiple LLM calls. This caused performance degradation when multiple operations needed to run in parallel.
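The sketch below illustrates the problem and the fix in isolation: a synchronous `litellm.completion()` call inside a coroutine blocks the event loop for the duration of the HTTP request, whereas `litellm.acompletion()` yields control so multiple calls can overlap. This is a minimal, hypothetical example (model name, prompts, and the `classify` helper are illustrative, and it assumes an API key is configured in the environment), not code from this PR.

```python
import asyncio
import litellm

async def classify(text: str) -> str:
    # litellm.completion() is synchronous: calling it here would block the
    # event loop until the request finishes, serializing all handlers.
    # litellm.acompletion() awaits the request instead, so other coroutines
    # can make progress while this call is in flight.
    response = await litellm.acompletion(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": text}],
    )
    return response.choices[0].message.content

async def main() -> None:
    # With acompletion, several LLM calls proceed concurrently.
    results = await asyncio.gather(
        classify("first page"),
        classify("second page"),
        classify("third page"),
    )
    print(results)

asyncio.run(main())
```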

what changed

  • Converted LLMClient.create_response() from sync to async method using litellm.acompletion() (see the sketch after this list)
  • Updated inference.observe() and inference.extract() functions to be async
  • Modified all handlers (ObserveHandler, ExtractHandler) to await async inference calls
  • Updated mock LLM client's create_response() method to be async for test compatibility
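The following sketch shows the shape of these changes under stated assumptions: `LLMClient.create_response()` and `ObserveHandler` are names from this PR, but their constructors, parameters, and internal logic here are illustrative, not the repository's actual implementation.

```python
import litellm

class LLMClient:
    """Thin wrapper around litellm; everything beyond the create_response()
    name is an assumption for illustration."""

    def __init__(self, model: str):
        self.model = model

    # Before this PR this was a regular def that called litellm.completion(),
    # blocking the event loop. It now awaits litellm.acompletion() instead.
    async def create_response(self, messages, **kwargs):
        return await litellm.acompletion(
            model=self.model,
            messages=messages,
            **kwargs,
        )

class ObserveHandler:
    """Hypothetical slice of a handler showing the added await at call sites."""

    def __init__(self, llm: LLMClient):
        self.llm = llm

    async def observe(self, instruction: str):
        # Inference helpers such as inference.observe() are now async,
        # so every call site gains an await.
        response = await self.llm.create_response(
            messages=[{"role": "user", "content": instruction}],
        )
        return response.choices[0].message.content
```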

test plan

  • Run CI tests and verify that all checks pass and behavior is unchanged (see the test sketch below)
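A hypothetical test sketch of the kind CI could exercise, assuming pytest-asyncio is installed; `MockLLMClient` stands in for the updated mock client mentioned above, and the timing threshold is illustrative.

```python
import asyncio
import pytest

class MockLLMClient:
    """Hypothetical stand-in for the test mock: create_response() is an
    async def, matching the real client's new signature."""

    async def create_response(self, messages, **kwargs):
        await asyncio.sleep(0.05)  # simulate network latency without blocking
        return {"choices": [{"message": {"content": "ok"}}]}

@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_calls_run_concurrently():
    client = MockLLMClient()
    start = asyncio.get_running_loop().time()
    await asyncio.gather(*[
        client.create_response([{"role": "user", "content": "hi"}])
        for _ in range(5)
    ])
    elapsed = asyncio.get_running_loop().time() - start
    # Five 50 ms calls overlapping should finish far sooner than 250 ms.
    assert elapsed < 0.2
```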

derekmeegan merged commit 3bcdd05 into main on Sep 25, 2025
13 checks passed
derekmeegan deleted the derek/make_litellm_async branch on September 25, 2025 at 23:23